Large Margin Winnow Methods for Text Categorization

Author

  • Tong Zhang
Abstract

The SNoW (Sparse Network of Winnows) architecture has recently been applied successfully to a number of natural language processing (NLP) problems. In this paper, we propose large margin versions of the Winnow algorithms, which we argue can potentially enhance the performance of basic Winnows (and hence of the SNoW architecture). We demonstrate that the resulting methods achieve performance comparable with support vector machines (SVMs) for text categorization applications. We also explain why both large margin Winnows and SVMs can be suitable for NLP tasks.
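
The abstract contrasts basic Winnow with large margin variants but gives no update rule. As a rough illustration only, the sketch below shows one common form of Balanced Winnow (separate positive and negative weight vectors with multiplicative updates) extended with a "thick threshold": it updates whenever y * score falls at or below a margin, not only when the prediction is wrong. This is not the algorithm proposed in the paper. The function names, the exponential update factor `eta`, the `margin` value and the constant bias feature are all assumptions chosen for illustration.

```python
import math

def train_balanced_winnow(examples, n_features, eta=0.1, margin=0.5, epochs=10):
    """Hypothetical sketch: Balanced Winnow with a margin-based update rule.

    examples: list of (x, y), x a list of floats, y in {+1, -1}.
    Returns (u, v); the score of x is sum((u_j - v_j) * x_j).
    """
    u = [1.0] * n_features  # positive weights, start uniform
    v = [1.0] * n_features  # negative weights, start uniform
    for _ in range(epochs):
        for x, y in examples:
            score = sum((uj - vj) * xj for uj, vj, xj in zip(u, v, x))
            # Large-margin flavour (assumed): update whenever the margin is
            # too small, not only on outright mistakes (y * score <= 0).
            if y * score <= margin:
                for j, xj in enumerate(x):
                    if xj != 0.0:  # only active features change (sparse update)
                        u[j] *= math.exp(eta * y * xj)   # promote/demote u
                        v[j] *= math.exp(-eta * y * xj)  # opposite for v
    return u, v

def predict(u, v, x):
    """Return +1 if the balanced score is positive, else -1."""
    score = sum((uj - vj) * xj for uj, vj, xj in zip(u, v, x))
    return 1 if score > 0 else -1

# Toy data (hypothetical): features [word_a, word_b, bias]; the last
# coordinate is a constant bias feature, an assumption of this sketch.
toy_data = [([1.0, 0.0, 1.0], 1), ([0.0, 1.0, 1.0], -1)]
u, v = train_balanced_winnow(toy_data, n_features=3)
```

On this toy data the loop stops updating once both examples clear the margin, so training continues past the first epoch in which every example is already classified correctly. That behaviour is the practical difference between a margin-based rule and purely mistake-driven Winnow.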


Similar resources

Mistake-driven Learning with Thesaurus for Text Categorization

This paper extends the mistake-driven learner WINNOW to better utilize thesauri for text categorization. In our method not only words but also semantic categories given by the thesaurus are used as features in a classifier. New filtering and disambiguation methods are used as pre-processing to solve the problems caused by the use of the thesaurus. In order to verify our methods, we test a large bod...


On the Importance of Parameter Tuning in Text Categorization

Text Categorization algorithms have a large number of parameters that determine their behaviour, whose effect is not easily predicted objectively or intuitively and may very well depend on the corpus or on the document representation. Their values are usually taken over from previously published results, which may lead to less than optimal accuracy in experimenting on particular corpora. In thi...


Combining Winnow and Orthogonal Sparse Bigrams for Incremental Spam Filtering

Spam filtering is a text categorization task that has attracted significant attention due to the increasingly huge amounts of junk email on the Internet. While current best-practice systems use Naive Bayes filtering and other probabilistic methods, we propose using a statistical, but non-probabilistic classifier based on the Winnow algorithm. The feature space considered by most current methods...


Automatic Categorization of Email into Folders: Benchmark Experiments on Enron and SRI Corpora

Office workers everywhere are drowning in email—not only spam, but also large quantities of legitimate email to be read and organized for browsing. Although there have been extensive investigations of automatic document categorization, email gives rise to a number of unique challenges, and there has been relatively little study of classifying email into folders. This paper presents an extensive...


Large margin multinomial mixture model for text categorization

In this paper, we present a novel discriminative training method for multinomial mixture models (MMM) in text categorization based on the principle of large margin. Under some approximation and relaxation conditions, large margin estimation (LME) of MMMs can be formulated as linear programming (LP) problems, which can be efficiently and reliably solved by many general optimization tools even fo...




Publication date: 2000